Effective SIMD Vectorization for Intel Xeon Phi Coprocessors
                    
                        
Related resources
Fine-tuning Vectorization and Memory Traffic on Intel Xeon Phi Coprocessors: LU Decomposition of Small Matrices
Common techniques for fine-tuning the performance of automatically vectorized loops in applications for Intel Xeon Phi coprocessors are discussed. These techniques include strength reduction, regularizing the vectorization pattern, data alignment and aligned data hint, and pointer disambiguation. In addition, the loop tiling technique of memory traffic tuning is shown. The optimization methods ...
Lattice QCD on Intel® Xeon Phi™ coprocessors
Lattice Quantum Chromodynamics (LQCD) is currently the only known model-independent, non-perturbative computational method for calculations in the theory of the strong interactions, and is of importance in studies of nuclear and high energy physics. LQCD codes use large fractions of supercomputing cycles worldwide and are often amongst the first to be ported to new high performance computing arc...
Effective Barrier Synchronization on Intel Xeon Phi Coprocessor
Barriers are a fundamental synchronization primitive, underpinning the parallel execution models of many modern shared-memory parallel programming languages such as OpenMP, OpenCL or Cilk, and are one of the main challenges to scaling. State-of-the-art barrier synchronization algorithms differ in tradeoffs between critical path length, communication traffic patterns and memory footprint. In thi...
Understanding the Costs of Many-Task Computing Workloads on Intel Xeon Phi Coprocessors
Many-Task Computing (MTC) aims to bridge the gap between HPC and HTC. MTC emphasizes running many computational tasks over a short period of time, where tasks can be either dependent or independent of one another. MTC has been well supported on Clouds, Grids, and Supercomputers on traditional computing architectures, but the abundance of hybrid large-scale systems using accelerators has motivat...
An Empirical Study of Intel Xeon Phi
With at least 50 cores, Intel Xeon Phi is a true manycore architecture. Featuring fairly powerful cores, two cache levels, and very fast interconnections, the Xeon Phi can get a theoretical peak of 1000 GFLOPs and over 240 GB/s. These numbers, as well as its flexibility (it can be used either as a coprocessor or as a stand-alone processor), are very tempting for parallel applications looking for new...
Journal
Journal title: Scientific Programming
Year: 2015
ISSN: 1058-9244,1875-919X
DOI: 10.1155/2015/269764